We present a comprehensive study of the utility function of the minority game in its
efficient regime. We develop an effective description of the state of the game. For the
payoff function g(x) = sgn(x) we explicitly represent the game as a Markov process and
prove that the number of states is finite. We also demonstrate that the utility function
is bounded. Using these facts we can explain all interesting observable features of the
aggregated demand: the appearance of strong fluctuations, their periodicity and the
existence of preferred levels. For another payoff, g(x) = x, the number of states is still
finite and the utility remains bounded, but the number of states cannot be reduced and
the probabilities of states are not calculated. However, using properties of the utility
and analysing the game in terms of de Bruijn graphs, we can also explain the distinct
peaks of demand and their frequencies.

Keywords: Minority game, adaptive system, Markov process, de Bruijn graph 

1. Introduction 

The minority game (MG) was designed [1] as a microscopic model of adaptive behaviour
observed in multi-agent systems. The MG is a typical bottom-up construct and
therefore usual definitions of the game first specify rules of behaviour for individuals.
Then, piecing together microscopic variables, one defines higher-order quantities
characterizing larger systems. In some cases, however, other constructs are also
possible, e.g. functions of state like score functions can be attributed to groups
of agents without specifying agents individually (cf. ref. [2]). Despite the simplicity of
the basic rules by which agents take decisions, the adaptive abilities and phenomenology of
populations playing MGs appear to be surprisingly interesting and their properties
are non-trivial [3]. Special studies were devoted to understanding functions such as
the aggregated demand, market volatility, market occupancy etc. It was shown [4, 5]
that the MG exhibits different modes of behaviour, depending on the game parameters:
the random, cooperation and herd modes. The latter case is characterized by a
small strategy space compared to the overall number of agents. Following the authors
of ref. [6] we prefer to call this regime efficient, because all players have all available
information at their disposal. Our study of this regime is motivated by the interesting
phenomenology observed in numerical simulations and the lack of satisfactory
interpretations of it. For example, the aggregated demand exhibits large-amplitude
oscillations [5] and periodicity in time [6, 7]. The crowd-anticrowd theory [8, 9]
presents an acceptable explanation for the oscillations but fails to deal with the
periodicity. This issue was treated by the authors of ref. [7] and, more fruitfully, ref. [2].
The authors of ref. [2] introduced the concept of the state of the MG but limited their
analysis to the reduced strategy space.

In our previous work [10] we found, in a different context, that the utility function
plays the crucial role in explaining the observable behaviour of the MG.
Therefore in this paper we further exploit the utility to study the phenomenology of
MGs in their efficient regime. We find that the utility is bounded and the number
of states is finite, and prove these facts for the payoff function g(x) = sgn(x). We
can represent the game as a Markov process and we can substantially reduce the
number of states and calculate their probabilities. Then such interesting features
of demand as its strong inhomogeneity and the presence of patterns in time can be
easily interpreted. For other payoff functions, e.g. g(x) = x, the number of states
cannot be reduced and the distribution of utility remains irregular. In this case we
cannot explicitly calculate the probabilities of states. However, using the same general
properties of the utility and representing the game as paths on de Bruijn diagrams,
we can also explain the strong fluctuations of demand and calculate their frequency.

2. Formal definition of the minority game 

At each time step $t$, the $n$-th agent out of $N$ ($n = 1, \dots, N$) takes an action
$a_{\alpha_n}(t)$ according to some strategy $\alpha_n(t)$. The action $a_{\alpha_n}(t)$ takes
either of two values: $-1$ or $+1$. The aggregated demand is defined as

$$A(t) = \sum_{n=1}^{N} a_{\alpha'_n}(t), \qquad (1)$$

where $a_{\alpha'_n}$ refers to the action according to the best strategy, as defined in eq. (3)
below. Thus defined, $A(t)$ is the difference between the numbers of agents who choose
the $+1$ and $-1$ actions. Agents do not know each other's actions but $A(t)$ is known
to all agents. The minority action $a^*(t)$ is determined from $A(t)$

$$a^*(t) = -\mathrm{sgn}\, A(t). \qquad (2)$$

Each agent's memory is limited to the $m$ most recent winning, i.e. minority, decisions.
Each agent has the same number $S \ge 2$ of devices, called strategies, used to predict
the next minority action $a^*(t + 1)$. The $s$-th strategy of the $n$-th agent, $\alpha_n^s$
($s = 1, \dots, S$), is a function mapping the sequence $\mu$ of the last $m$ winning decisions to
this agent's action $a_{\alpha_n^s}$. Since there are $P = 2^m$ possible realizations of $\mu$, there are
$2^P$ possible strategies. At the beginning of the game each agent randomly draws $S$
strategies, according to a given distribution function $\rho(n): n \to \Delta_n$, where $\Delta_n$ is a
set consisting of $S$ strategies for the $n$-th agent.

Each strategy $\alpha_n^s$, belonging to any of the sets $\Delta_n$, is given a real-valued function
$U_{\alpha_n^s}$ which quantifies the utility of the strategy: the more preferable the strategy, the
higher its utility. Strategies with higher utilities are more likely to be chosen by agents.

There are various choice policies. In the popular greedy policy each agent selects
the strategy of the highest utility

$$\alpha'_n(t) = \arg\max_{s} U_{\alpha_n^s}(t). \qquad (3)$$

If there are two or more strategies with the highest utility then one of them is
chosen randomly. The highest-utility strategy used by the agent is called the
active strategy, in contrast to passive strategies, unused at a given moment. However,
at any time all agents evaluate all their strategies, the active and the passive ones. Each
strategy $\alpha_n^s$ is given a payoff depending on its action $a_{\alpha_n^s}$

$$R_{\alpha_n^s}(t) = -a_{\alpha_n^s}(t)\, g[A(t)], \qquad (4)$$

where $g$ is an odd payoff function, e.g. the steplike $g(x) = \mathrm{sgn}(x)$ [4], the proportional
$g(x) = x$ or the scaled proportional $g(x) = x/N$. The learning process corresponds to
updating the utility of each strategy

$$U_{\alpha_n^s}(t + 1) = U_{\alpha_n^s}(t) + R_{\alpha_n^s}(t), \qquad (5)$$

such that every agent knows how good its strategies are.
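The rules of eqs. (1)-(5) can be put together in a short simulation. The following is only a minimal sketch, not the code used for the results in this paper: the function name `simulate_minority_game`, the encoding of the history as an integer and the uniform random draw of strategies are assumptions made for illustration.

```python
import random

def simulate_minority_game(N=401, m=1, S=2, steps=200, payoff="sgn", seed=0):
    """Toy minority game following eqs. (1)-(5); returns the list of A(t)."""
    rng = random.Random(seed)
    P = 2 ** m                                   # number of possible histories
    # A strategy is a tuple of P actions (+1/-1), one per history index.
    strategies = [[tuple(rng.choice((-1, 1)) for _ in range(P))
                   for _ in range(S)] for _ in range(N)]
    utilities = [[0] * S for _ in range(N)]      # U(0) = 0 for all strategies
    history = rng.randrange(P)                   # history mu encoded as an integer
    demand = []
    for _ in range(steps):
        A = 0
        for n in range(N):
            best = max(utilities[n])
            # greedy policy, eq. (3); ties are broken at random
            s = rng.choice([k for k in range(S) if utilities[n][k] == best])
            A += strategies[n][s][history]       # eq. (1)
        demand.append(A)
        # payoff function g: steplike or proportional
        g = (A > 0) - (A < 0) if payoff == "sgn" else A
        # eqs. (4)-(5): every strategy, active or passive, is updated
        for n in range(N):
            for k in range(S):
                utilities[n][k] -= strategies[n][k][history] * g
        minority = -1 if A > 0 else 1            # eq. (2); N odd, so A != 0
        history = ((history << 1) | (minority == 1)) % P
    return demand

if __name__ == "__main__":
    print(simulate_minority_game(N=401, m=1, S=2, steps=20))
```

An odd population size N is assumed so that A(t) never vanishes and the minority action is always defined.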

3. Phenomenology 

In order to examine MGs in the efficient regime, we performed a series of numerical
simulations with different combinations of game parameters, and chose the three most
representative cases: (m, N) = (1, 401), (2, 1601), (5, 1601), all with the number of
strategies per agent S = 2. All three games are in the efficient mode. In the first
two cases the condition $NS \gg 2^P$ is fulfilled. In the third one it is not met and
the consequences of this fact will become clear later in the text. In all three experiments
the full strategy space is used.

The efficient mode is often called the symmetric phase in the literature (cf. e.g. ref.
[11]), which means that both actions are taken by the minority agents with the same
frequency.

Figs 1, 2 and 3 present results for the steplike payoff function g(x) = sgn(x):
the time evolution of $A(t)$, the autocorrelation function $R(\tau)$ and the scatter plots
of $A(t + 2 \cdot 2^m)$ against $A(t)$, respectively. The same results for the proportional
payoff function g(x) = x are given in Figs 4, 5 and 6.

Even a fleeting glance at Figs 1 and 4 reveals regularities in $A(t)$ for both
payoff functions, but they are more regular and distinct for g(x) = x. In this case their
period increases with the memory length m and their maximal values are equal
to half of the population size, N/2. This periodicity can be better seen using






September 29, 2009 13:46 



4 K. Wawrzyniak and W. Wislicki 




Fig. 1. Time evolution of the aggregated demand $A(t)$ for three combinations of the population
size N and agent memory m: N = 401, m = 1 (left), N = 1601, m = 2 (middle) and N = 1601,
m = 5 (right). Simulations were done for S = 2 and g(x) = sgn(x). Preferred values of A are
visible for all three games.

Fig. 2. Autocorrelation function $R(\tau)$ for three combinations of the population size N and agent
memory m: N = 401, m = 1 (left), N = 1601, m = 2 (middle) and N = 1601, m = 5 (right).
Simulations were done for S = 2 and g(x) = sgn(x). The highest values of R are for $\tau = 2 \cdot 2^m$,
except for $\tau = 0$, for all games fulfilling the $NS \gg 2^P$ condition.

Fig. 3. Plots of the aggregated demand $A(t + 2 \cdot 2^m)$ vs. $A(t)$ for three combinations of the
population size N and agent memory m: N = 401, m = 1 (left), N = 1601, m = 2 (middle) and
N = 1601, m = 5 (right). Simulations were done for S = 2 and g(x) = sgn(x). Apparent preferred
levels of $A(t)$ are seen as clusters of points. For m = 1 and m = 2 points tend to flock around
the diagonals, indicating positive correlation for $\tau = 2 \cdot 2^m$.

Fig. 4. Time evolution of the aggregated demand $A(t)$ for three combinations of the population
size N and agent memory m: N = 401, m = 1 (left), N = 1601, m = 2 (middle) and N = 1601,
m = 5 (right). Simulations were done for S = 2 and g(x) = x. Preferred values of A are visible for
all three games.

Fig. 5. Autocorrelation function $R(\tau)$ for three combinations of the population size N and agent
memory m: N = 401, m = 1 (left), N = 1601, m = 2 (middle) and N = 1601, m = 5 (right).
Simulations were done for S = 2 and g(x) = x. The highest values of R are for $\tau = 2 \cdot 2^m$, except
for $\tau = 0$, for all games fulfilling the $NS \gg 2^P$ condition.

Fig. 6. Plots of the aggregated demand $A(t + 2 \cdot 2^m)$ vs. $A(t)$ for three combinations of the
population size N and agent memory m: N = 401, m = 1 (left), N = 1601, m = 2 (middle) and
N = 1601, m = 5 (right). Simulations were done for S = 2 and g(x) = x. For m = 1 and m = 2
points tend to flock around the diagonals, indicating positive correlation, but the clustering of points
is not very pronounced.
the autocorrelation function $R(\tau)$ (cf. Figs 2 and 5), where $\tau$ is the correlation time. The
autocorrelation R exhibits statistically periodic peaks with period $T = 2 \cdot 2^m$, as
has already been observed in the efficient regime in refs [2, 7]. The autocorrelation
is much less pronounced for games which do not meet the criterion $NS \gg 2^P$, as
seen in Figs 2 and 5 (right). Relaxing this criterion spoils the periodicity of the
aggregated demand. Similar observations can be made by inspecting the $A(t + 2 \cdot 2^m)$
vs. $A(t)$ scatter plots in Figs 3 and 6, where points for games fulfilling the $NS \gg 2^P$
condition (left and middle panels in Figs 3 and 6) are more tightly clustered around the
diagonals.
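The periodicity test described above can be reproduced on any recorded series of $A(t)$. The sketch below uses a standard sample autocorrelation normalised so that R(0) = 1; the exact normalisation used for Figs 2 and 5 is not stated in the text, so this choice, and the function name `autocorrelation`, are assumptions.

```python
def autocorrelation(series, tau):
    """Sample autocorrelation R(tau) of a sequence, normalised so that R(0) = 1."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + tau] - mean)
              for t in range(n - tau))
    return cov / var

if __name__ == "__main__":
    # Artificial signal with period 4, i.e. the period 2 * 2**m expected for m = 1.
    signal = [1, 0, -1, 0] * 50
    for tau in range(0, 9):
        print(tau, round(autocorrelation(signal, tau), 3))
```

For a signal with period $2 \cdot 2^m$ the function peaks at multiples of that lag, which is the signature looked for in the autocorrelation plots.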

Another interesting feature of the aggregated demand, seen in the one-dimensional
plots of $A(t)$ and better in the two-dimensional plots of $A(t + 2 \cdot 2^m)$
vs. $A(t)$, is the existence of preferred values of A. These preferred values show up
as clusters in the two-dimensional plots. The clusters are better focused and more
numerous for g(x) = sgn(x) (Fig. 3) than for g(x) = x (Fig. 6).

The time evolution of the utility functions appears to be a strongly mean-reverting
process, independently of the payoff function, as seen e.g. in Fig. 7. Moreover,
for the steplike payoff g(x) = sgn(x) the utility is bounded to a rather narrow belt
$-2^m \le U(t) \le 2^m$, where here and in Fig. 7 $U(t)$ stands for the utility of any
strategy. The formal proof of this statement is given in Section 5. This feature is
observed for any N and S, provided the criterion $NS \gg 2^P$ is met.




Fig. 7. Trajectories of the utility function $U(t)$ for all strategies of the MG with S = 2 and
m = 1 and N high enough to ensure the $NS \gg 2^P$ regime. Two payoff functions are shown:
the steplike g(x) = sgn(x) (left) and the proportional g(x) = x (right). Lines correspond to all
different strategies. Note the difference of vertical scales between panels.



4. The concept of state 

Since the MG represents a system with many degrees of freedom, the dimensionality of
states is expected to be large. In general, for each time step t, the specification of the state
$x(t)$ consists of:

A. The history of decisions $\mu(t)$,

B. The set of strategies of all agents $\{\alpha_n^s\}_{n=1,\dots,N}^{s=1,\dots,S}$,

C. The set of utilities for all strategies of all agents $\{U_{\alpha_n^s}(t)\}_{n=1,\dots,N}^{s=1,\dots,S}$,

D. A function relating strategies to agents: $\rho(n): n \to \Delta_n$.

Although the history of decisions $\mu(t)$ partially stores information about the past
of the process, transition probabilities depend only on the present state and the
process is Markovian.

A substantial reduction of the number of state parameters and a simplification of the
state description are possible in our case. Agents can use identical strategies (see footnote a).
The expected number of identical strategies in the whole population behaves
asymptotically, for $N \to \infty$, like $NS/2^P$. The condition $NS \gg 2^P$ assures that the game
stays in that asymptotic regime and the number of identical strategies is close to
its asymptotic expected value. Identical strategies have the same utilities over the
whole game, provided the initial utilities of the strategies are the same, e.g. U(0) = 0
for all strategies. It is thus enough to take into account only the reduced set of pairwise
different strategies $\{\beta_i\}_{i=1}^{2^P}$ and the utilities defined on them, $\{U_{\beta_i}(t)\}_{i=1}^{2^P}$.

Concerning point D, it is sufficient to find the probabilities for agents to have strategies
from the set of pairwise different strategies. The probability that a given agent has
any particular strategy from this set is equal to $1 - (1 - 1/2^P)^S$. For large N, the
number of agents having this strategy is equal to $N(1 - (1 - 1/2^P)^S)$. Therefore
point D, i.e. a function ascribing strategies to agents, corresponding to the agent
grouping tensor of ref. [2], can be dropped entirely in our case.
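As a quick numerical illustration, the two quantities used above, the expected number of copies of a strategy $NS/2^P$ and the probability $1 - (1 - 1/2^P)^S$ that an agent holds a particular strategy, can be evaluated for the three parameter sets of Section 3. This is a sketch only; the helper name `strategy_multiplicity` is hypothetical.

```python
def strategy_multiplicity(N, S, m):
    """Return (expected copies of one strategy, probability an agent holds it)."""
    P = 2 ** m                    # number of histories
    n_strategies = 2 ** P         # number of pairwise different strategies
    expected_copies = N * S / n_strategies
    p_has = 1 - (1 - 1 / n_strategies) ** S
    return expected_copies, p_has

if __name__ == "__main__":
    for m, N in [(1, 401), (2, 1601), (5, 1601)]:
        copies, p_has = strategy_multiplicity(N, 2, m)
        print(f"m={m}, N={N}: NS/2^P = {copies:.3g}, P(agent has it) = {p_has:.3g}")
```

For m = 5 the ratio $NS/2^P$ is far below one, which is why that game does not satisfy $NS \gg 2^P$.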

Finally, we describe states using $\mu(t)$ and the set of utilities for the complete set
of $2^P$ pairwise different strategies $\{\beta_i\}_{i=1}^{2^P}$:

$$x(t) = [\mu(t), U_{\beta_1}(t), U_{\beta_2}(t), \dots, U_{\beta_{2^P}}(t)]. \qquad (6)$$

A similar description of state was used in ref. [2]. There are, however, two
important differences between their description and ours: (i) the authors of ref. [2]
introduce a functional map giving the time evolution of the system in any regime, and
(ii) they degenerate the game by following mean values of demand, thus making
the process deterministic and Markovian, and retaining the possibility to randomize it
perturbatively. Contrary to them, we do not degenerate the game. We consider it
as a stochastic Markov process and eventually calculate the probability measure on
states for the steplike payoff.

a Two strategies are called different if their Hamming distance is not equal to zero. The number of
pairwise different strategies is equal to $2^P$.
Utilities $\{U_{\beta_i}(t)\}_{i=1}^{2^P}$, considered as functions of time, are called trajectories. In
the majority of cases, and provided the number of observed time steps is large enough,
strategies can be distinguished by their trajectories. The sufficient condition for
all $2^P$ trajectories $U_{\beta_i}(t)$ ($0 \le t \le t_0$) to be distinguishable at $t_0$ is that all $2^m$
possible histories $\mu$ appear until then in a row. On the other hand, the appearance of all
histories $\mu$ until $t_0$, but not necessarily exclusively, represents a necessary condition
of distinguishability for trajectories. Examples of MGs in the regime $NS \gg 2^P$ are
shown in Fig. 7, where trajectories are plotted for m = 1 and S = 2 and for the two
payoff functions further studied in this paper: g(x) = sgn(x) and g(x) = x.

5. Analysis of the minority game with payoff g(x) = sgn(x)

5.1. Finiteness of the number of states

In this section we demonstrate that for any t the utility of any strategy is bounded
from below and above: $U_{\min} \le U(t) \le U_{\max}$, where $U_{\min(\max)} = -(+)2^m$.

Assume that at a given time t two different strategies have the same utilities.
From eqn (5) for the steplike payoff function it follows that after one time step
these utilities can either differ by two units or remain the same. If the initial values
of the utilities of all SN strategies at t = 0 are the same and after $\tau$ time steps at
least one of them attains its extremal value, $U_{\min}$ or $U_{\max}$, then the trajectories
cover the set of $2^m + 1$ values (cf. Fig. 7, left)

$$U(\tau) \in \{u_i\}_{i=1}^{2^m+1} = \{2^m, 2^m - 2, \dots, 2, 0, -2, \dots, -2^m + 2, -2^m\}. \qquad (7)$$

Possible evolution scenarios leading to the values $U_{\min(\max)}$ can be designed by
using the transitions described in Appendix A. Using this notation we have $u_1 = U_{\max}$
and $u_{2^m+1} = U_{\min}$. The number of different strategies characterized by the same $u_i$
is given by combinatorics as the number of trajectories starting from 0 and ending
at $u_i$

$$\#\{\beta_k : U_{\beta_k} = u_i\} = \binom{2^m}{i-1}, \qquad i = 1, \dots, 2^m + 1. \qquad (8)$$

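The probability that the active strategy of the n-th agent $\alpha'_n$ has utility $u_i$ is equal
to

$$\mathcal{P}[U_{\alpha'_n}(t) = u_i] = \begin{cases} 1 - \mathcal{P}[U_{\alpha'_n}(t) < u_i], & i = 1 \\ \mathcal{P}[U_{\alpha'_n}(t) < u_{i-1}] - \mathcal{P}[U_{\alpha'_n}(t) < u_i], & i > 1 \end{cases} \qquad (9)$$

Using argumentation similar to that of ref. [9], but extended to the full strategy
space, one finds that

$$\mathcal{P}[U_{\alpha'_n}(t) < u_i] = \prod_{s=1}^{S} \left[1 - \mathcal{P}[U_{\alpha_n^s}(t) \ge u_i]\right] = \left[1 - \frac{\#\{\beta_k : U_{\beta_k} \ge u_i\}}{2^P}\right]^S, \qquad (10)$$

where, for $t = \tau$,

$$\#\{\beta_k : U_{\beta_k} \ge u_i\} = \sum_{j=1}^{i} \binom{2^m}{j-1}. \qquad (11)$$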

Denoting $\mathcal{P}_{\max(\min)} = \mathcal{P}[U_{\alpha'_n}(\tau) = U_{\max(\min)}]$, one sees from eqn (9) that
$\mathcal{P}_{\max} > \mathcal{P}_{\min}$. We notice that for any utility $u_i$ different from $U_{\min}$ or $U_{\max}$, the number
of different strategies $\binom{2^m}{i-1}$ is even. Moreover, half of the strategies corresponding
to each level $U_{\min} < u_i < U_{\max}$ suggest the opposite action to the other half.
According to eqn (9), if two (or more) strategies have the same utility, then all
have the same probability to be the best strategy for the n-th agent. This means
that, if one excludes the best and the worst strategies, half of the remaining strategies
recommend the same action as the best or the worst strategy. Hence the probability
that an agent plays according to a strategy suggesting the same action as the best
strategy is equal to

$$\mathcal{P}[a_{\alpha'_n}(\tau) = a_{\alpha_B}(\tau)] = \mathcal{P}_{\max} + \frac{1}{2}\left(1 - \mathcal{P}_{\max} - \mathcal{P}_{\min}\right), \qquad (12)$$

where $\alpha_B(\tau)$ is the best strategy from the whole set of strategies in the game, i.e.
$U_{\alpha_B}(\tau) = u_1$, and $1 - \mathcal{P}_{\max} - \mathcal{P}_{\min}$ refers to the probability that the agent's best
strategy is neither the worst nor the best of all strategies. The factor 1/2 reflects
that half of the strategies with non-extremal utilities suggest the same action as the
best one. As $\mathcal{P}_{\max} > \mathcal{P}_{\min}$, from eqn (12) it follows that if one of the strategies has
the utility $U_{\max}$, then more than half of the population plays according to the best
strategy. Subsequently, this subpopulation loses and gets the negative payoff. The
rest are the winners and get the positive payoff. This mechanism bounds the utility
to stay between $U_{\min}$ and $U_{\max}$. In addition, we know the formula for the fraction
of agents playing the same action.
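The chain of formulas (8)-(12) is easy to evaluate exactly. The sketch below does so with rational arithmetic, assuming the situation of eq. (7) in which the utilities cover the full set of $2^m + 1$ levels; the function names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction
from math import comb

def active_utility_distribution(m, S):
    """P[U_active = u_i] for i = 1..2^m + 1 (u_1 = U_max first), eqs. (8)-(11)."""
    P = 2 ** m                                     # number of histories
    total = 2 ** P                                 # pairwise different strategies
    counts = [comb(P, i) for i in range(P + 1)]    # eq. (8)
    probs = []
    below_prev = Fraction(1)                       # nothing lies above U_max
    cumulative = 0
    for c in counts:
        cumulative += c                            # eq. (11)
        below = (1 - Fraction(cumulative, total)) ** S   # eq. (10)
        probs.append(below_prev - below)           # eq. (9)
        below_prev = below
    return probs

def fraction_following_best(m, S):
    """Eq. (12): probability that an agent acts like the globally best strategy."""
    probs = active_utility_distribution(m, S)
    p_max, p_min = probs[0], probs[-1]
    return p_max + Fraction(1, 2) * (1 - p_max - p_min)

if __name__ == "__main__":
    print(active_utility_distribution(1, 2))
    print(fraction_following_best(1, 2))
```

For m = 1 and S = 2 this gives a fraction 11/16 of agents acting like the best strategy, so under these assumptions the expected imbalance is (2 · 11/16 − 1)N = 3N/8.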

5.2. Representation of the minority game as the Markov process 
5.2.1. Case m = 1 

  $\mu$  |  $\alpha_1$   $\alpha_2$   $\alpha_3$   $\alpha_4$
 ------+------------------------------------------
   -1  |     -1          -1           1           1
    1  |     -1           1          -1           1

Table 1. Strategies for m = 1

In this case the complete specification of states and the calculation of the transition
matrix are relatively easy. All strategies are listed in Tab. 1 and states are listed in
Tab. 2. At the beginning of the game we assume no a priori knowledge, so that all
utilities are equal to zero, and two initial states are possible: $x_1$ and $x_2$. For these
two states the values of $\mu$ are different. Subsequent time evolutions depend on
ratios between numbers of agents playing $+1$ or $-1$ actions and are illustrated in
Fig. 11 and described in detail in Appendix A. These states and transitions are
sufficient to define a memoryless representation of the MG with the transition graph
displayed in Fig. 8. Some of its states have the same expected demand $\mathrm{E}A$ over
realizations of the game, e.g. $\mathrm{E}A(x_i) = 0$ ($i = 1, \dots, 4$), as the same numbers of
agents play according to strategies recommending opposite actions. Using formulas
(9)-(11) we can find $\mathrm{E}A$ for all states (cf. Tab. 2), consistently with observations in
Fig. 3, where five clusters on the diagonal are found around the values from Tab. 2.

[Table 2: one row per state $x_i$ ($i = 1, \dots, 12$), with columns for the state components, the probability $\mathcal{P}(x_i)$ and the expected demand $\mathrm{E}A(x_i)$.]

Table 2. States $x_i$ ($i = 1, \dots, 12$), their probabilities $\mathcal{P}(x_i)$ and demands for m = 1. $\mathrm{E}A(x_i)$
stands for the expected value of A for the state $x_i$. The $\mathcal{P}$ and $\mathrm{E}$ represent a priori values, i.e. before
strategies are assigned to agents. After game initialization these values may become different and
depend on the realization of the game, but the sequence of states is preserved.

Our process is a stationary Markov chain for which the stationary Master Equation
can be solved with respect to the state probabilities. Their values are given in
Tab. 2 in the column marked $\mathcal{P}(x_i)$ ($i = 1, \dots, 12$). The state probabilities from
Tab. 2 can also be used to find statistical periods of the demand

$$\mathcal{P}[A(t) = A(t + \tau)] = \sum_{i,j} \delta[A(x_j(t + \tau)), A(x_i(t))] \cdot \mathcal{P}[x_j(t + \tau) \mid x_i(t)] \cdot \mathcal{P}[x_i(t)], \qquad (13)$$

where

$$\delta(x, y) = \begin{cases} 1, & x = y \\ 0, & \text{otherwise.} \end{cases} \qquad (14)$$

The maximal value of 7/16 is found for $\tau = 4$ and this explains why the largest
correlation is also found for $\tau = 4$.

Fig. 8. Diagram of the Markov chain representation of the MG in the efficient regime for m = 1.
If transitions to two states are possible from a given state, both a priori transition probabilities
are the same. This happens for $x_1 \to x_{3,11}$, $x_2 \to x_{4,12}$, $x_3 \to x_{5,7}$ and $x_4 \to x_{6,8}$.
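Equation (13) can be evaluated mechanically once the transition matrix and the stationary probabilities are known. The sketch below does this for a hypothetical deterministic four-state cycle, chosen only because its answer is obvious; it is not the twelve-state chain of Fig. 8, and the function names are assumptions.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(T, tau):
    """tau-step transition matrix T**tau (identity for tau = 0)."""
    n = len(T)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(tau):
        result = matmul(result, T)
    return result

def same_demand_probability(T, pi, demand, tau):
    """Eq. (13): P[A(t) = A(t + tau)] for state probabilities pi."""
    Tt = matrix_power(T, tau)
    n = len(pi)
    return sum(pi[i] * Tt[i][j]
               for i in range(n) for j in range(n)
               if demand[i] == demand[j])          # the delta of eq. (14)

if __name__ == "__main__":
    # Hypothetical toy chain: 0 -> 1 -> 2 -> 3 -> 0 with demands 1, 0, -1, 0.
    T = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
    pi = [Fraction(1, 4)] * 4
    demand = [1, 0, -1, 0]
    for tau in range(1, 9):
        print(tau, same_demand_probability(T, pi, demand, tau))
```

The lag at which this probability is largest plays the role of the statistical period of the demand.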

5.2.2. Case m > 1

Any MG with m > 1 in the efficient regime can be represented as a Markov process
with a finite number of states. The same method as for m = 1, but computationally
more demanding, can be used to calculate the state probabilities.

6. Analysis of the minority game with payoff g(x) = x

Contrary to the MG with the steplike payoff g(x) = sgn(x), in the case of the
proportional payoff g(x) = x pairwise different strategies with identical utilities are
unlikely (cf. Fig. 7). This means that the probability that pairwise different
strategies have the same utility is small compared to the case of g(x) = sgn(x).
Consequently, the probability that an agent has a freedom of choice of the next state is
negligible for g(x) = x. This means that such a game is in a sense less stochastic than
for g(x) = sgn(x). Nevertheless, the game is still periodic because the number of
states is finite. A persuasive explanation of periodicity is proposed by the authors of
ref. [5] using the de Bruijn representation of the memory sequences $\mu$. Here we extend
their analysis and explain the peaks of $A(t)$ and their frequency using two approaches
based on the utility analysis.

6.1. First approach

The dynamics of the MG can be efficiently studied using de Bruijn graphs, as shown in
ref. [11]. The decision history $\mu(t)$ is a sequence of m minority actions

$$\mu(t) = [a^*(t - m), a^*(t - m + 1), \dots, a^*(t - 1)]. \qquad (15)$$

The history $\mu(t + 1)$ is obtained by adding $a^*(t)$ to the right and deleting $a^*(t - m)$ from
the left of the vector (15), such that there are two possible successors $\mu(t + 1)$ of
$\mu(t)$. If one history can be obtained from another one using this procedure, then the
latter has a directed edge to the former. Histories may be represented by labeled
edges. These rules define the de Bruijn graph of order m. Examples for m = 1 and
m = 2 are given in Fig. 9.




Fig. 9. De Bruijn graphs of orders m = 1 (left) and m = 2 (right). Dashed lines represent examples 
of the Euler trails on the graph: one trail for m = 1 (left) and one of two possible Euler trails for 
m = 2 (right) 

Histories in MGs are not equiprobable [11]. Among all paths on the de Bruijn
graph of the game, Euler paths define the shortest sequence of histories where each
strategy loses and wins equally likely. In the non-Eulerian paths some histories are
more frequent and therefore some strategies are more profitable. We show in the
following that in the efficient mode the non-Eulerian paths are rare compared to
the Eulerian ones.
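The de Bruijn graph and one of its Euler paths can be generated directly from the update rule of eq. (15). The sketch below uses Hierholzer's algorithm, which is a standard way to find an Euler circuit and is our choice for the illustration, not a method taken from the references; the function names are hypothetical.

```python
from itertools import product

def de_bruijn_edges(m):
    """Edges (mu, a, mu_next) of the order-m de Bruijn graph over actions -1/+1."""
    edges = []
    for mu in product((-1, 1), repeat=m):
        for a in (-1, 1):
            # append the new minority action on the right, drop the oldest one
            edges.append((mu, a, mu[1:] + (a,)))
    return edges

def euler_circuit(m):
    """One Euler circuit, returned as the list of histories visited in order."""
    out = {}
    for mu, _, nxt in de_bruijn_edges(m):
        out.setdefault(mu, []).append(nxt)
    start = (-1,) * m
    stack, circuit = [start], []
    while stack:                       # Hierholzer's algorithm
        v = stack[-1]
        if out[v]:
            stack.append(out[v].pop())
        else:
            circuit.append(stack.pop())
    return circuit[::-1]

if __name__ == "__main__":
    for m in (1, 2):
        path = euler_circuit(m)
        print(m, len(path) - 1, path)
```

The circuit uses each of the $2^{m+1}$ transitions exactly once and leaves every history exactly twice, so along such a path all histories are equally frequent.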

For the proportional payoff, the prevalent number of strategies have a unique utility.
In such a case, the probability (9) for the active strategy $\alpha'_n$ can be simplified (cf.
also ref. [9])

$$\mathcal{P}[U_{\alpha'_n}(t) = u_l] = \left(1 - \frac{l - 1}{2^P}\right)^S - \left(1 - \frac{l}{2^P}\right)^S, \qquad l \ge 1. \qquad (16)$$

Consider the case when A is the largest possible. Since $\{u_l\}$ is a sorted list of
utilities, this is possible if the first half of the strategies in this list suggest actions opposite
to the last half. Then the probability of an action suggested by the best strategy is
equal to

$$\sum_{l=1}^{2^{P-1}} \mathcal{P}[U_{\alpha'_n}(t) = u_l] = 1 - \frac{1}{2^S}. \qquad (17)$$

This means that for large NS for about $N(1 - 1/2^S)$ agents their active strategy is
the same as the best strategy and the expected absolute value of the aggregated
demand is equal to

$$|A| = N\left(1 - \frac{1}{2^{S-1}}\right). \qquad (18)$$

In particular, if S = 2 then $|A| = N/2$.
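The telescoping sum behind eq. (17) and the resulting peak height of eq. (18) can be checked with exact arithmetic. This is a sketch under the assumption stated in the text that all $2^P$ utilities are distinct; the function names are hypothetical.

```python
from fractions import Fraction

def prob_active_rank(l, S, P):
    """Eq. (16): probability that the active strategy is the l-th best of 2**P."""
    total = 2 ** P
    return (1 - Fraction(l - 1, total)) ** S - (1 - Fraction(l, total)) ** S

def prob_follow_top_half(S, P):
    """Eq. (17): sum of eq. (16) over the better half of the sorted list."""
    return sum(prob_active_rank(l, S, P) for l in range(1, 2 ** (P - 1) + 1))

def peak_demand(N, S):
    """Eq. (18): |A| = N (1 - 1/2**(S-1))."""
    return N * (1 - Fraction(1, 2 ** (S - 1)))

if __name__ == "__main__":
    for S in (2, 3, 4):
        print(S, prob_follow_top_half(S, P=4), peak_demand(1601, S))
```

The sum does not depend on P because all intermediate terms cancel, leaving $1 - (1/2)^S$.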

There is also a more fundamental reason why the order of strategies in the list
appears such that the two halves of the list suggest opposite actions. We noticed that a large
fluctuation of A is only possible if the game is in one of the two de Bruijn nodes called
homogeneous, i.e. consisting of identical symbols: $\mu_{h1(2)} = [-(+)1, \dots, -(+)1]$.
Interestingly enough, peaks are observable only after one of the homogeneous histories,
but not after both, as explained technically in Appendix B.

Since a high $A(t)$ appears only after the history $\mu_C$, we have just two transitions
in the Eulerian path that start from this history. From this it follows that the
frequency of peaks is equal to

$$f = \frac{2}{2^{m+1}} = 2^{-m}, \qquad (19)$$

in agreement with our simulations. The value $2^{m+1}$ is the length of the Euler path
and it corresponds to the period of A observed in Fig. 5.

Our argumentation becomes strict and eqn (17) is exact in the efficient mode
when $NS \gg 2^P$, ideally in the limit $NS \to \infty$. But we also observe cyclic peaks of
demand for N = 1601 and m = 5, when the efficiency condition is not met (cf. Fig. 4,
right). In fact, the condition $NS \gg 2^P$ can be relaxed to the requirement that
the population is numerous enough for the game to be in the herd mode. Games in
that mode do not follow Eulerian paths because for smaller N the pool of strategies
is too sparse and some histories occur more frequently. Nevertheless, the mechanism
of peak creation is approximately preserved, as long as N is large enough to cause
the split of utilities into two groups.



6.2. Second approach 

A somewhat simpler explanation may be given by dividing strategies, at any time,
into two categories: the good ones with positive payoff and the bad ones with negative payoff [10].
The probability that an agent has no good strategies, or at least one good strategy, is equal
to $1/2^S$ and $1 - 1/2^S$, respectively. Rapid fluctuations of demand are transferred to
similar fluctuations of the utility. $A(t_1)$ fluctuates after the history $\mu_C = \mu(t_1)$
when the strategies with higher utility indicate identical actions. If $A(t_1)$ strongly
fluctuates, then at $t_1 + 1$ about $N(1 - 1/2^S)$ agents have at least one strategy with high
utility and they choose it. Strategies split into two groups of high and low utility
with a gap between these two groups (cf. Fig. 13 in Appendix B). Strategies with
high/low utility do not suggest the same actions, provided $\mu \neq \mu_C$, and therefore no
peak of A is generated. The history $\mu_C$ has a non-vanishing probability to reappear at some
$t_2 > t_1$. All agents belonging to the group with at least one high-utility strategy tend
to react identically and $A(t_2)$ fluctuates maximally, i.e. $A(t_2) = N(1 - 1/2^{S-1})$. This is
illustrated in Fig. 10 (upper left), where for S = 2 we have A(t = 1000) = N/2. At $t_2$,
all strategies with high $U(t_2)$ fail and get the penalty $-A(t_2)$, whereas those with
low $U(t_2)$ are rewarded with $A(t_2)$. After $t_1$ agents are divided into three groups,
provided S = 2: the group with two good strategies, with one good and one bad, and
with two bad. As seen in Fig. 10, at t = 1000 a quarter of the population with two
high-utility strategies evolves into the group with two low-utility strategies (lower left), and vice versa
for another quarter with two initially low-utility strategies (lower right). The remaining
half of the population just swaps the utilities of their strategies (upper right).

Fig. 10. The time evolution of the aggregated demand (upper left) and utilities for three cases:
an agent with one high- and one low-utility strategy (upper right), two high-utility strategies
(lower left) and two low-utility strategies (lower right) at t = 1000. These three cases may be
quantitatively distinguished using the values of utilities at t = 1000, corresponding to the location
of the first maximum of $A(t)$ in the upper left panel. Simulation was performed for the MG with
N = 1601, S = 2, m = 5 and g(x) = x.

Results showing periodicity of A(t) from simulations become closer to the the- 
oretical results for large NS/2 P ratio. If it is small, then the game hardly follows 
the Eulerian path and peaks of A(t) appear randomly. 
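The group sizes quoted in the second approach follow from simple counting. The sketch below assumes, as the probabilities $1/2^S$ and $1 - 1/2^S$ in the text imply, that each of an agent's S strategies is independently good or bad with probability 1/2; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def good_strategy_distribution(S):
    """P(an agent holds k good strategies), k = 0..S, each good with prob. 1/2."""
    return [Fraction(comb(S, k), 2 ** S) for k in range(S + 1)]

if __name__ == "__main__":
    dist = good_strategy_distribution(2)
    print("two bad, one good and one bad, two good:", dist)
    print("at least one good:", 1 - dist[0])
```

For S = 2 this reproduces the split into a quarter, a half and a quarter of the population.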

7. Stochasticity of the game depends on initial conditions 

We assumed that $U_{\alpha_n^s}(t = 0) = 0$ for all $\alpha_n^s$. This assumption seems natural as it
reflects no a priori preference for any strategy. However, it appears to be critical
for the MG dynamics for g(x) = sgn(x). The stochastic transitions mentioned in
Section 5.2 show up for a degenerate state, i.e. when more than one strategy has the same
utility. Removing this ambiguity suppresses stochasticity and the game becomes
deterministic. In such a case, our simplified description of the state fails because
strategies have unique utilities and cannot be aggregated. Consequently, the Markovian
treatment is no longer useful but the description in terms of de Bruijn graphs
becomes interesting. In particular, the game follows the Eulerian path on the de Bruijn
graph. In the case of the proportional payoff g(x) = x, the game is just deterministic
and follows one of the Eulerian paths.

8. Conclusions 

We studied the MG in the efficient mode. We observe interesting collectivity in
agent behaviour in this mode. Depending on the payoff function g(x), the game is
driven by different dynamics which require different methods of analysis. In the case
g(x) = sgn(x), provided the population N is large enough to assure $NS \gg 2^P$, the
MG can be described in terms of a Markov process with a finite number of states,
where transitions may be both stochastic and deterministic. This representation
completely defines the dynamics of the game in the stationary regime and allows for the
calculation of state occupancies and other observables. The Markov representation
provides an explanation of the periodicity and preferred levels of the aggregated
demand $A(t)$. In practical terms this approach is tough for m > 1 due to the large
number of states. We failed to find any relation between the memory length m
and the total number of states. Neither the simplified concept of state nor the Markov
process description is valid if an initial preference is given to any strategy.

For the proportional payoff g(x) = x, stochasticity of transitions disappears but
one still observes periodicity. One also observes distinct peaks of the aggregated
demand, with height equal to a half of the population, assuming S = 2. In
the herd regime, there always exists a history μ_C for which a fraction 1 - 1/2^S of
agents react identically, and this is seen in the peak A(t) = N(1 - 1/2^(S-1)). We
provided two compatible explanations of these phenomena. The first uses the ordered
list {u_i} of 2^P strategies and is similar to the reasoning for g(x) = sgn(x). The
second approach is a simplification of the first one to the case when only two classes
of strategies are used instead of all 2^P classes. The second approach was also
successfully exploited in our analysis of the multi-market minority game [10].
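
The peak height of half the population for S = 2 can be reproduced by a small calculation. The sketch below assumes, for illustration, that an agent reacts with the herd whenever at least one of its S strategies belongs to the better half of the strategy set, and that each strategy belongs to that half independently with probability 1/2; the function name peak_demand is hypothetical.

```python
def peak_demand(N, S):
    """Absolute value of the demand at a peak under the stated assumptions.

    A fraction 1 - 2**(-S) of agents holds at least one strategy from
    the better half and acts with the herd; the remaining 2**(-S) act
    against it. The demand is the difference of the two group sizes.
    """
    herd = 1 - 0.5**S       # fraction reacting identically
    rest = 0.5**S           # fraction taking the opposite action
    return N * (herd - rest)

print(peak_demand(1000, 2))  # 500.0
print(peak_demand(1000, 3))  # 750.0
```

For S = 2 this gives three quarters of the agents acting together and a peak of N/2, in agreement with the height stated in the text.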

We studied games with full strategy space. Some authors, e.g. refs [HE], reduce
the strategy space and reproduce many features of the full MG, e.g. the behaviour of
σ(A)^2/N. This trick, however, has serious drawbacks since it reduces the number
of states in the Markov description of the game and significantly affects its time
evolution. For g(x) = sgn(x), the Markov representation is oversimplified by such
reduction.

In this work we focused on theoretical issues of the MG with real histories. We
did not elaborate on applications of our model to real-life systems, e.g. financial
markets. At the moment, applications are more discussed by other authors [H2 
\TE\ \T7\ ITS] . Perhaps the most general mathematical description of MGs with real 
histories is given in ref. [19] using the generating functional approach.

Appendix A. Transition scenarios for m = 1 minority game with
g(x) = sgn(x) 

Possible transition scenarios for the m = 1 MG, represented as a Markov chain,
are illustrated in Fig. 11. At the beginning of the game all utilities are equal to
zero. Depending on the history μ, only two initial states can exist: x_1 = [-1, 0, 0, 0, 0]
and x_2 = [1, 0, 0, 0, 0]. For each of these two states two further scenarios are equally
possible, because the utilities of the corresponding strategies are the same. The choice
depends on the ratio between the numbers of agents in two groups: one with a = 1 and
another one with a = -1. These scenarios are as follows.

Transition 1 

Being in the state x_1, the majority of agents use strategies suggesting
a = -1. Then

- the minority action in the next step is a* = 1,

- strategies α_1 and α_2 give negative payoff,

- strategies α_3 and α_4 give positive payoff.

The system goes to the state x_3 = [1, -1, -1, 1, 1] (cf. Fig. 11, Transition 1),
where U_α3 = U_α4 = 1 and these strategies suggest different actions on the
last history μ = 1. Similarly, there are two strategies with the utilities
U_α1 = U_α2 = -1 suggesting different actions on μ = 1. Hence, there are
two equiprobable scenarios, further described as Transitions 3 and 4.

Fig. 11. Trajectories of utilities for m = 1.



Transition 2 

Being in the state x_1, the majority of agents use strategies suggesting a = 1.
Then

- the minority action in the next step is a* = -1,

- strategies α_3 and α_4 give negative payoff,

- strategies α_1 and α_2 give positive payoff.

The system goes to the state x = [-1, 1, 1, -1, -1] (cf. Fig. 11, Transition 2),
where U_α1 = U_α2 = 1 and these strategies give the same action on the last
history μ = -1. Most agents use these strategies (e.g. 3/4 of the population,
provided S = 2) and the sole possibility is that the system goes to the state
x_2.

Transition 3 

Being in the state x_3, the majority of agents use strategies suggesting a = 1
and the system passes to x_5. In this state U_α3 = U_max and U_α2 = U_min
(cf. Fig. 11, Transition 3). According to the reasoning from section 5.1, if
one utility attains its maximal or minimal value, most agents use strategies
suggesting the same action as the best strategy. Consequently, there is only
one scenario possible in x_5: the best strategy, and all strategies giving the
same output as the best one, lose and the system goes to the state x_4.



Transition 4 
Another possibility in x_3 is that most agents decide a = -1 and the
system goes to x_7. In this state U_α4 = U_max and U_α1 = U_min (cf. Fig. 11,
Transition 4). Subsequently, the best strategy, and all strategies giving the
same output as the best one, lose and the system goes to a state in which
both best strategies suggest the same action for the last history μ = -1. The
majority of the population uses one of these best strategies and the system
moves to x_1.

Transitions 5-8

These transitions are analogous to Transitions 1-4, but the initial state is
x_2.
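
The transitions above can be replayed with a minimal sketch of one update step for g(x) = sgn(x). The state is written as [μ, U_1, ..., U_4]. The explicit action tables of the four strategies are an assumption chosen to be consistent with Transitions 1-3 (α_1 and α_2 agree on μ = -1 and differ on μ = 1, and likewise α_3 and α_4); the names STRATEGIES and step are hypothetical.

```python
# Assumed action tables for the four m = 1 strategies:
# STRATEGIES[k][mu] is the action suggested by alpha_k after history mu.
STRATEGIES = {
    1: {-1: -1, 1: -1},
    2: {-1: -1, 1: 1},
    3: {-1: 1, 1: -1},
    4: {-1: 1, 1: 1},
}

def step(state, majority_action):
    """One update of the state (mu, U_1, ..., U_4) for g(x) = sgn(x).

    Strategies that suggested the minority action gain 1, the others
    lose 1, and the minority action becomes the new history.
    """
    mu, *utilities = state
    minority = -majority_action
    new_utilities = [
        u + (1 if STRATEGIES[k][mu] == minority else -1)
        for k, u in enumerate(utilities, start=1)
    ]
    return (minority, *new_utilities)

x1 = (-1, 0, 0, 0, 0)
print(step(x1, -1))  # (1, -1, -1, 1, 1), the state x_3 of Transition 1
print(step(x1, 1))   # (-1, 1, 1, -1, -1), the state reached in Transition 2
```

Which majority action is realized in a degenerate state is the stochastic part of the dynamics; the sketch takes it as an input rather than sampling it from a population of agents.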



Appendix B. Algorithm generating strong demand fluctuations 

In Fig. 12 we present the flow chart illustrating the appearance of strong fluctuations
of A(t). Below we describe the algorithm step by step. The first three stages lead to the
first peak. The next steps explain why the subsequent peaks follow each other and why
they have opposite signs.

Stage 1 

If A(t_1) stands for the first peak of demand, then three prior conditions
have to be fulfilled. The first is that μ(t_1 - 1) = μ_h1(2), where μ_h1(2) =
[-(+)1, ..., -(+)1] is a homogeneous node.

Stage 2 

It is also required that at t_1 - 1 the majority of agents decide to change the
node. If this is fulfilled, then the minority action is

\[
a^*(t_1 - 1) =
\begin{cases}
-1, & \mu(t_1 - 1) = \mu_{h1}, \\
\phantom{-}1, & \mu(t_1 - 1) = \mu_{h2}.
\end{cases}
\tag{B.1}
\]

Hence μ(t_1) = μ(t_1 - 1), i.e. the minority action is to stay in the same node,
and it gives the positive payoff to the winning strategy α

\[
R_{\alpha}(t_1 - 1) = -a_{\alpha} A(t_1 - 1).
\tag{B.2}
\]
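
The statement that the minority action of Eq. (B.1) keeps the game in the homogeneous node can be checked directly from the history update rule. The sketch below assumes that the history is a tuple of the last m minority actions, updated by dropping the oldest entry and appending the newest; both function names are hypothetical.

```python
def next_history(mu, minority_action):
    """Shift the history: drop the oldest action, append the newest."""
    return mu[1:] + (minority_action,)

def staying_action(mu):
    """Minority action that keeps a homogeneous node unchanged.

    For mu = (-1, ..., -1) this is -1 and for mu = (1, ..., 1) it is 1.
    """
    if len(set(mu)) != 1:
        raise ValueError("history is not a homogeneous node")
    return mu[0]

mu_h1 = (-1, -1, -1)
a_star = staying_action(mu_h1)
print(a_star, next_history(mu_h1, a_star) == mu_h1)  # -1 True
```

Any other minority action would move the game along the second outgoing edge of the node, which is what happens at t_1 in the next stage.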

Stage 3 

There is a non-zero probability that strategies corresponding to the first
half of the utilities in {u_i} have won in the last step. Such a circumstance is
possible provided stages 1 and 2 are realized. If this third condition is fulfilled,
we denote such a history by μ_C. Then all strategies from the first half suggest
the same reaction after μ_C. Hence the majority decision at t_1 is to stay in the
node and the maximal demand (18) is generated. All strategies with high utility
get the penalty and the low-utility ones are rewarded by the same amount.
The game follows the minority decision and escapes from the μ_C de Bruijn
node. When the game leaves μ_C, the strategy set is split into two groups
of high and low utility, as illustrated in Fig. 13. In the next steps the game
goes to μ ≠ μ_C.

Fig. 12. The flow chart of the MG evolution algorithm, illustrating appearance of distinct peaks
of demand.




Stage 4 

The next steps do not substantially affect utilities as long as the history μ_C
does not reappear. There is no history other than μ_C assuring that the first
half of the strategies in the {u_i} list suggest a collective action resulting in the
most spiky demand. Hence, after t_1, the variations of A do not affect the
utility significantly until μ_C reappears at t_2 > t_1, when the set of the
best half of the strategies is the same as at t_1. Then the best half of the strategies
suggest that the game shift to another node, characterized by the history
μ(t_2 + 1) ≠ μ_C, and the maximal demand |A(t_2)| = N(1 - 1/2^(S-1)) is generated.
All strategies from the best half get a penalty proportional to the absolute value
of the aggregated demand. Concurrently, the half of the strategies with the lowest
utility are rewarded with the same amount (cf. Fig. 13).

Stage 5 

Next, the game follows the edge leading to the same node. Subsequently,
the best half of the strategies suggest staying in the same vertex μ_C. Again, a high
absolute value of demand is generated but the sign of A(t_2 + 1) is opposite
to the sign of A(t_2). Consequently, all strategies with high U(t_2 + 1) get the
penalty N(1 - 1/2^(S-1)) and, concurrently, strategies with low utility get a reward
of the same size.

Stage 6 

The game goes to the vertex μ(t_2 + 2) ≠ μ_C and the scenario from stages
4-6 repeats.

Appendix C. Symbol captions



a_α                                    - action suggested by strategy α
a*                                     - the minority action
α_n^s                                  - the s-th strategy of the n-th agent
[...]                                  - the active strategy, or the strategy of the highest utility
                                         for the n-th agent
[...]                                  - the best strategy from the whole set of strategies in the game
A                                      - aggregated demand
EA(x_i)                                - expected value of demand over possible realizations of the game
[...]                                  - set of 2^P pairwise different strategies
[...]                                  - set of S strategies of the n-th agent
f                                      - frequency of demand peaks
g                                      - payoff function
m                                      - length of the sequence of last minority decisions
μ = [a*(t-m), ..., a*(t-1)]            - sequence of the last minority decisions
μ_C                                    - history of minority decisions preceding first strong fluctuation of demand
μ_h1(2) = [-(+)1, ..., -(+)1]          - homogeneous de Bruijn nodes
N                                      - the total number of agents in the game
P = 2^m                                - number of possible realizations of μ
[...]_min(max)                         - probability that the minimal (maximal) utility of any agent attains
                                         the absolute minimum (maximum) value U_min(max)
p(n)                                   - distribution of strategies for the n-th agent at the beginning of the game
R_α                                    - payoff for the strategy α
S                                      - the total number of strategies for each agent
{u_i}                                  - ordered list of different utility values when the extremal value of
                                         U_min(max) is attained
U_α                                    - utility of the strategy α
U_min(max) = -(+)2^r                   - the absolute minimum (maximum) value of the utility
x(t) = [μ(t), U_1(t), ..., U_{2^P}(t)] - state of the game at time t